SQL Azure : Design Patterns (part 2) - Sharding

12/11/2010 9:28:32 AM

4. Sharding

So far, you've seen patterns that implement a single connection at a time. In a shard (see Figure 4 ), multiple databases can be accessed simultaneously in a read and/or write fashion and can be located in a mixed environment (local and cloud). However, keep in mind that the total availability of your shard depends partially on the availability of your local databases.

Shards are typically implemented when performance requirements are such that data access needs to be spread over multiple databases in a scale-out approach.

Figure 4. Shard pattern

4.1. Shard Concepts and Methods

Before visiting the shard patterns, let's analyze the various aspects of shard design. Some important concepts are explained here:

Decision rules. Logic that determines without a doubt which database contains the interesting record(s). For example, if Country = US, then connect to SQL Azure Database #1. Rules can be static (hardcoded in C#, for example) or dynamic (stored in XML configuration files). Static rules tend to limit the ability to grow the shard easily, because adding a new database is likely to change the rules. Dynamic rules, on the other hand, may require the creation of a rule engine. Not all shard libraries use decision rules.
Round-robin. A method that changes the database endpoint for every new connection (or other condition) in a consistent manner. For example, when accessing a group of five databases in a round-robin manner, the first connection is made to database 1, the second to database 2, and so on. Then, the sixth connection is made to database 1 again, and so forth. Round-robin methods avoid the creation of decision engines and attempt to spread the data and the load evenly across all databases involved in a shard.
Horizontal partition. A collection of tables with similar schemas that represent an entire dataset when concatenated. For example, sales records can be split by country, where each country is stored in a separate table. You can create a horizontal partition by applying decision rules or using a round-robin method. When using a round-robin method, no logic helps identify which database contains the record of interest; so all databases must be searched.
Vertical partition. A table schema split across multiple databases. As a result, a single record's columns are stored on multiple databases. Although this is considered a valid technique, vertical partitioning isn't explored in this book.
Mirrors. An exact replica of a primary database (or a large portion of the primary database that is of interest). Databases in a mirror configuration obtain their data at roughly the same time using a synchronization mechanism like SQL Data Sync. For example, a mirror shard made of two databases, each of which has the Sales table, has the same number of records in each table at all times. Read operations are then simplified (no rules needed) because it doesn't matter which database you connect to; the Sales table contains the data you need in all the databases.
Shard definition. A list of SQL Azure databases created in a server in Azure. The consumer application can automatically detect which databases are part of the shard by connecting to the master database. If all databases created are part of the shard, enumerating the records in sys.databases gives you all the databases in the shard.
Breadcrumbs. A technique that leaves a small trace that can be used downstream for improved decisions. In this context, breadcrumbs can be added to datasets to indicate which database a record came from. This helps in determining which database to connect to in order to update a record and avoids spreading requests to all databases.

When using a shard, a consumer typically issues CRUD (create, read, update, and delete) operations. Each operation has unique properties depending on the approach chosen. Table 1 outlines some possible combinations of techniques to help you decide which sharding method is best for you. The left column describes the connection mechanism used by the shard, and the top row identifies the shard's storage mechanism.

Table 1. Shard access techniques
	Horizontal partitions	Mirror
Decision rules	Rules determine how equally records are spread in the shard. Create: Apply rules. Read: Connect to all databases with rules included as part of a WHERE clause, or choose a database based on the rules. Add breadcrumbs for update and delete operations. Update: Apply rules or use breadcrumbs, and possibly move records to another database if the column updated is part of the rule. Delete: Apply rules, or use breadcrumbs when possible.	This combination doesn't seem to provide a benefit. Mirrored databases aren't partitioned, and so no rule exists to find a record.
Round-robin	Records are placed randomly in databases based on the available connection. No logic can be applied to determine which database contains which records. Create: Insert a record in the current database. Read: Connect to all databases, issue statements, and concatenate resultsets. Add breadcrumbs for update and delete operations. Update: Connect to all databases (or use breadcrumbs), and apply updates using a primary ke. Delete: Same as update.	All records are copied to all databases. Use a single database (called the primary database) for writes. Create: Insert a record in the primary database only. Read: Connect to any database in a round-robin fashion. Update: Update a record in the primary database only. Delete: Delete a record in the primary database only.

Shards can be very difficult to implement. Make sure you test thoroughly when implementing shards. You can also look at some of the shard libraries that have been developed. The shard library found on CodePlex and explained further in this article uses .NET 4.0; you can find it with its source code at http://enzosqlshard.codeplex.com . It uses round-robin as its access method. You can also look at another implementation of a shard library that uses SQLAzureHelper; this shard library uses decision rules as its access method and is provided by the SQL Azure Team (http://blogs.msdn.com/b/sqlazure/).

4.2. Read-Only Shards

Shards can be implemented in multiple ways. For example, you can create a read-only shard (ROS). Although the shard is fed from a database that accepts read/write operations, its records are read-only for consumers.

Figure 5 shows an example of a shard topology that consists of a local SQL Server to store its data with read and write access. The data is then replicated using the SQL Data Sync framework (or other method) to the actual shards, which are additional SQL Azure databases in the cloud. The consuming application then connects to the shard (in SQL Azure) to read the information as needed.

Figure 5. Read-only shard topology

In one scenario, the SQL Azure databases each contain the exact same copy of the data (mirror shard), so the consumer can connect to one of the SQL Azure databases (using a round-robin mechanism to spread the load, for example). This is perhaps the simpler implementation because all the records are copied to all the databases in the shard blindly. However, keep in mind that SQL Azure doesn't support distributed transactions; you may need to have a compensating mechanism in case some transactions commit and others don't.

Another implementation of the ROS consists of synchronizing the data using horizontal partitioning. In a horizontal partition, rules are applied to determine which database contains which data. For example, the SQL Data Synch service can be implemented to partition the data for US sales to one SQL Azure database and European sales to another. In this implementation, either the consumer knows about the horizontal partition and knows which database to connect to (by applying decision rules based on customer input), or it connects to all databases in the cloud by applying a WHERE clause on the country if necessary, avoiding the cost of running the decision engine that selects the correct database based on the established rules.

4.3. Read-Write Shards

In a read-write shard (RWS), all databases are considered read/write. In this case, you don't need to use a replication topology that uses the SQL Data Sync framework because there is a single copy of each record within the shard. Figure 6 shows a RWS topology.

Although a RWS removes the complexity of synchronizing data between databases, the consumer is responsible for directing all CRUD operations to the appropriate cloud database. This requires special considerations and advanced development techniques to accomplish, as previously discussed.